We study adaptive, data-dependent dimensionality reduction in the context of supervised learning in general metric spaces. Our main statistical contribution is a generalization bound for Lipschitz functions in metric spaces that are doubling, or nearly doubling. On the algorithmic front, we describe an analogue of PCA for metric spaces: namely, an efficient procedure that approximates the data's intrinsic dimension, which is often much lower than the ambient dimension. Our approach thus leverages the dual benefits of low dimensionality: (1) more efficient algorithms, e.g., for proximity search, and (2) more optimistic generalization bounds.
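To make the notion of "intrinsic dimension much lower than ambient dimension" concrete, here is an illustrative heuristic (not the paper's procedure) that estimates a doubling-type dimension of a point cloud: around random centers, it compares the number of points within radius r to the number within 2r, since in a set of doubling dimension d the mass of a ball grows roughly like r^d. All function and parameter names are hypothetical.

```python
import numpy as np

def estimate_doubling_dimension(points, radius, n_probes=50, seed=0):
    """Heuristic intrinsic-dimension estimate for a finite point set.

    For randomly chosen centers, count neighbors within radius r and
    within 2r; log2 of the count ratio approximates the local growth
    exponent, a proxy for doubling dimension. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    idx = rng.choice(len(pts), size=min(n_probes, len(pts)), replace=False)
    ratios = []
    for i in idx:
        dists = np.linalg.norm(pts - pts[i], axis=1)
        small = np.count_nonzero(dists <= radius)      # |B(x, r)|
        big = np.count_nonzero(dists <= 2 * radius)    # |B(x, 2r)|
        if small > 0:
            ratios.append(np.log2(big / small))
    return float(np.mean(ratios)) if ratios else 0.0
```

For points lying on a line segment embedded in a high-dimensional ambient space, this estimate comes out near 1 regardless of the ambient dimension, which is the sense in which low intrinsic dimension can be exploited even when the ambient dimension is large.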